Intra- and interpopulation genotype reconstruction from tagging SNPs.

نویسندگان

  • Peristera Paschou
  • Michael W Mahoney
  • Asif Javed
  • Judith R Kidd
  • Andrew J Pakstis
  • Sheng Gu
  • Kenneth K Kidd
  • Petros Drineas
چکیده

The optimal method to be used for tSNP selection, the applicability of a reference LD map to unassayed populations, and the scalability of these methods to genome-wide analysis, all remain subjects of debate. We propose novel, scalable matrix algorithms that address these issues and we evaluate them on genotypic data from 38 populations and four genomic regions (248 SNPs typed for approximately 2000 individuals). We also evaluate these algorithms on a second data set consisting of genotypes available from the HapMap database (1336 SNPs for four populations) over the same genomic regions. Furthermore, we test these methods in the setting of a real association study using a publicly available family data set. The algorithms we use for tSNP selection and unassayed SNP reconstruction do not require haplotype inference and they are, in principle, scalable even to genome-wide analysis. Moreover, they are greedy variants of recently developed matrix algorithms with provable performance guarantees. Using a small set of carefully selected tSNPs, we achieve very good reconstruction accuracy of "untyped" genotypes for most of the populations studied. Additionally, we demonstrate in a quantitative manner that the chosen tSNPs exhibit substantial transferability, both within and across different geographic regions. Finally, we show that reconstruction can be applied to retrieve significant SNP associations with disease, with important genotyping savings.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Finding haplotype tagging SNPs by use of principal components analysis.

The immense volume and rapid growth of human genomic data, especially single nucleotide polymorphisms (SNPs), present special challenges for both biomedical researchers and automatic algorithms. One such challenge is to select an optimal subset of SNPs, commonly referred as "haplotype tagging SNPs" (htSNPs), to capture most of the haplotype diversity of each haplotype block or gene-specific reg...

متن کامل

MLR-tagging: informative SNP selection for unphased genotypes based on multiple linear regression

UNLABELLED The search for the association between complex diseases and single nucleotide polymorphisms (SNPs) or haplotypes has recently received great attention. For these studies, it is essential to use a small subset of informative SNPs accurately representing the rest of the SNPs. Informative SNP selection can achieve (1) considerable budget savings by genotyping only a limited number of SN...

متن کامل

A comparison of tagging methods and their tagging space.

Single-nucleotide polymorphism (SNP) tagging is widely used as a way of saving genotyping costs in association studies. A number of different tagging methods have been developed to reduce the number of markers to be genotyped while maintaining power for detecting effects on non-assayed SNPs. How the different methods perform in different settings, the degree to which they overlap and share comm...

متن کامل

BNTagger: improved tagging SNP selection using Bayesian networks

Genetic variation analysis holds much promise as a basis for disease-gene association. However, due to the tremendous number of candidate single nucleotide polymorphisms (SNPs), there is a clear need to expedite genotyping by selecting and considering only a subset of all SNPs. This process is known as tagging SNP selection. Several methods for tagging SNP selection have been proposed, and have...

متن کامل

Selecting tagging SNPs for association studies using power calculations from genotype data.

Recent studies have indicated that linkage disequilibrium (LD) between single nucleotide polymorphism (SNP) markers can be used to derive a reduced set of tagging SNPs (tSNPs) for genetic association studies. Previous strategies for identifying tSNPs have focused on LD measures or haplotype diversity, but the statistical power to detect disease-associated variants using tSNPs in genetic studies...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Genome research

دوره 17 1  شماره 

صفحات  -

تاریخ انتشار 2007